##By: Jason Spector

What is ggplot2

ggplot2 is a data visualization package for the statistical programming language R. Created by Hadley Wickham in 2005 when he was a graduate student at Iowa State, ggplot2 is an implementation of Leland Wilkinson’s Grammar of Graphics, a general scheme for data visualization which breaks up graphs into semantic components.

Basic Grammar Defines Components of Graphics

data: in ggplot2, data must be stored as an R data frame.
coordinate system: describes the 2-D space that data is projected onto. For example, map projections.
geoms: describe the type of geometric objects that represent data. For example, points, lines, polygons.
aesthetics: describe the visual characteristics that represent data. For example, position, size, color, shape, transparency, fill.
scales: for each aesthetic, describe how the visual characteristic is converted to display values. For example, log scales, color scales, size scales, shape scales.
stats: describe statistical transformations that typically summarize data. For example, counts, means, medians, regression lines.
facets: describe how data is split into subsets and displayed as multiple small graphs.

Hint: a plot is built up one layer at a time. Start from the base layer created by ggplot() and add each further layer with +.
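To see how the grammar components line up with actual code, here is a minimal sketch. It assumes the built-in mtcars data as a stand-in for the quarterback data used later, and the name p is just for illustration.

```r
library(ggplot2)

# mtcars ships with R; it stands in for the quarterback data here (an assumption)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +       # data + aesthetics (base layer)
  geom_point(aes(colour = factor(cyl))) +         # geom: one point per car
  stat_smooth(method = "lm", formula = y ~ x) +   # stat: fitted regression line
  scale_x_log10() +                               # scale: log-transformed x axis
  coord_cartesian() +                             # coordinate system (the default)
  facet_wrap(vars(am))                            # facets: one panel per value of am

p
```

Each + adds one more piece on top of the base layer, so you can comment out any line and still get a valid plot.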

Data used in the presentation

# Setup
options(scipen=999)  # turn off scientific notation like 1e+06
library(ggplot2)

qbs <- read.csv('nfl_qbs.csv', stringsAsFactors = FALSE)

(qbs_start <- qbs[(qbs$GS > 0 & qbs$Att >= 200), ])
mvp <- qbs_start[which(qbs_start$TD >= median(qbs_start$TD)), ]

##Geometric objects

Geometric objects (geoms) are the elements that we mark on the plot. They are what ggplot2 uses to draw a line, bar or box chart. For example:

geom_point for scatter plots, geom_line for time series and trend lines, geom_boxplot for boxplots, geom_histogram for histograms, and geom_bar for bar plots.

To set up the plot area we call ggplot with aes, which stands for aesthetics! Notice that no data shows up because we have not yet told ggplot how we want it plotted.

# Init ggplot
ggplot(qbs, aes(x=TD, y=G))  # TD and G are columns in 'qbs'

##Bar charts

geom_bar() lets you plot bar charts. The theme function can do many things, but for now we only use it to rotate the x-axis text. This may already look like a lot to remember. When in doubt, google it! There are tons of examples on the internet.

ggplot(data = qbs) + geom_bar(mapping = aes(x = Tm)) + theme(axis.text.x = element_text(angle = 90))

The y-axis shows count, but count is not a variable in the dataset. Many graphs, like bar charts, histograms and frequency polygons, count the rows in each group or bin and plot that new value.
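To check where count comes from, we can pull the computed values back out of a plot. This is a small sketch that assumes the mpg data bundled with ggplot2 instead of qbs. The names bar_counts, table_counts and counts_df are made up for the example.

```r
library(ggplot2)

# mpg ships with ggplot2; class plays the role that Tm plays above
p <- ggplot(mpg, aes(x = class)) + geom_bar()

# The bar heights that geom_bar() computed behind the scenes
bar_counts <- layer_data(p)$count

# The same numbers from a plain frequency table
table_counts <- as.vector(table(mpg$class))

# If the counts are already computed, geom_col() plots them as given
counts_df <- as.data.frame(table(class = mpg$class))
ggplot(counts_df, aes(x = class, y = Freq)) + geom_col()
```

layer_data() returns the data frame ggplot2 actually draws for a layer, which is handy whenever a value on the plot is not a column in your data.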

##Positional arguments

Want to make your graphs more colorful? You can color a bar chart using the fill aesthetic. Mapping fill to a second variable, here Lg, stacks the bars by that variable.

ggplot(data = qbs) + geom_bar(mapping = aes(x = Tm, fill = Lg)) + theme(axis.text.x = element_text(angle = 90))

You can do much more with colors and types of bars but for now we are going to stick to the simple stuff.
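As one example of the other bar types, the position argument of geom_bar controls how filled bars are arranged. The sketch below assumes the mpg data bundled with ggplot2 rather than qbs, and the plot names are only for illustration.

```r
library(ggplot2)

bars <- ggplot(mpg, aes(x = class, fill = drv))

stacked <- bars + geom_bar()                    # default: groups stacked on top of each other
dodged  <- bars + geom_bar(position = "dodge")  # groups side by side
filled  <- bars + geom_bar(position = "fill")   # stacks rescaled to proportions

stacked
dodged
filled
```

Stacking is good for totals, dodging for comparing groups directly, and fill for comparing shares.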

Histogram

Basic histogram (y axis is count)

ggplot(midwest, aes(x = area)) +  # midwest is an example dataset bundled with ggplot2
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Basic histogram (y axis is density)

ggplot(qbs, aes(x = G)) + 
  geom_histogram(aes(y = ..density..), colour = "black", fill = "dodgerblue") +
  geom_density()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
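The stat_bin() message above asks us to pick the bin size ourselves. Here is a small sketch of the two ways to do that, assuming the built-in faithful data as a stand-in. The values 15 and 5 are arbitrary choices for the example.

```r
library(ggplot2)

# faithful ships with R; waiting is the time between eruptions in minutes
h <- ggplot(faithful, aes(x = waiting))

p_bins  <- h + geom_histogram(bins = 15)      # ask for a number of bins
p_width <- h + geom_histogram(binwidth = 5)   # or fix the width of each bin

p_bins
p_width
```

Setting either argument also makes the message go away.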

##Scatter plot

To generate a scatter plot we add geom_point()

ggplot(qbs, aes(x=G, y=TD)) + geom_point()

To add a linear regression line to our graph, we use geom_smooth with method = "lm". Notice the grey area around the line: that is the confidence interval (95% by default).

ggplot(qbs, aes(x=G, y=TD)) + geom_point() + geom_smooth(method="lm")
## `geom_smooth()` using formula 'y ~ x'

We can change the confidence level with the level argument of geom_smooth, or hide the band altogether with se = FALSE.

ggplot(qbs, aes(x=G, y=TD)) + geom_point() + geom_smooth(method="lm", level = .75)
## `geom_smooth()` using formula 'y ~ x'

ggplot(qbs, aes(x=G, y=TD)) + geom_point() + geom_smooth(method="lm", level = .75, se = FALSE)
## `geom_smooth()` using formula 'y ~ x'

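You can assign a plot to a variable and add to it later, or you can write the whole call at once. We show this along with setting limits for our graph canvas. Notice that changing the limits with xlim and ylim removes the values that fall outside them, so the regression line is refit to the remaining points!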

g <- ggplot(qbs, aes(x=G, y=TD)) + geom_point() + geom_smooth(method="lm") 

g + xlim(c(10, 16)) + ylim(c(20, 40))  
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 50 rows containing non-finite values (stat_smooth).
## Warning: Removed 50 rows containing missing values (geom_point).
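The warnings above come from xlim and ylim dropping rows. If you only want to zoom in without losing data, coord_cartesian is an alternative. This is a sketch under the assumption that the built-in mtcars data stands in for qbs, and the names base_plot, clipped and zoomed are made up.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Scale limits: rows outside the range are dropped before the line is fitted
clipped <- base_plot + xlim(c(2, 4))

# Coordinate limits: every row is kept, only the viewing window changes
zoomed <- base_plot + coord_cartesian(xlim = c(2, 4))

clipped
zoomed
```

The two plots show the same window, but the regression line in zoomed is still fitted to all of the data.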

To add labels we use the labs function with different arguments!

g + labs(title="Games Vs TDs", subtitle="2019 Season", y="TD", x="G", caption="Quarterback Touchdowns")
## `geom_smooth()` using formula 'y ~ x'

We can add mathematical expressions to labels by wrapping them in quote().

g +  labs(
    x = quote(sum(x[i] ^ 2, i == 1, n)),
    y = quote(alpha + beta + frac(delta, theta))
  )
## `geom_smooth()` using formula 'y ~ x'

We can change the color of the points with the col argument: col in geom_point colors the points and col in geom_smooth colors the line. For the available color names, Google ggplot color options! The size argument changes the size of the points. Lastly, we can label points by adding a geom_text layer. Use vjust or hjust to move the text around. However, the first example below shows why labels become hard to read once there are many points. Plotly is a good way to get around this!

ggplot(qbs_start, aes(x=G, y=TD)) + 
  geom_point(col="dodgerblue", size=3) +   # Set static color and size for points
  geom_smooth(method="lm", col="red") +    # change the color of line
  geom_text(aes(label=Player), vjust = -.5)   # label each point, nudged above it
## `geom_smooth()` using formula 'y ~ x'

ggplot(qbs, aes(x=G, y=TD)) + 
  geom_point(col="darkslateblue", size=1) +   # Set static color and size for points
  geom_smooth(method="lm", col="orchid4")  # change the color of line
## `geom_smooth()` using formula 'y ~ x'

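When a variable determines the color, we map it inside aes. Look at Tm as our example here. You could also map it in the layer itself with geom_point(aes(col = Tm)). Writing geom_point(col = Tm) outside aes fails, because Tm is a column and not an object R can find on its own. Point shapes work the same way through the shape argument!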

ggplot(qbs, aes(x=G, y=TD, col = Tm)) + 
  geom_point(size=3)   # Static size; color is mapped to Tm in aes
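The difference between a mapped color and a static color is easy to mix up, so here is a side-by-side sketch. It assumes the built-in mtcars data, with factor(cyl) playing the role of Tm. The names mapped and static are only for illustration.

```r
library(ggplot2)

# Mapped: color varies with a column, so it goes inside aes()
mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl)), size = 3)

# Static: one color for every point, so it stays outside aes()
static <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "dodgerblue", size = 3)

mapped
static
```

Only the mapped version gets a legend, because only there does the color carry information from the data.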

The Brewer palettes are very popular, especially because several of them are colorblind-friendly!

library(RColorBrewer)
head(brewer.pal.info, 10) 
# Set2 has only 8 colors, so we plot just the first 8 rows
g <- ggplot(qbs[1:8, ], aes(x=G, y=TD, col = Tm)) + 
  geom_point(size=3) + scale_colour_brewer(palette = "Set2")

g

Themes can be changed easily by adding a theme function at the end!

g + theme_bw() + labs(subtitle="BW Theme")

g + theme_classic() + labs(subtitle="Classic Theme")

g + theme_dark() + labs(subtitle="Dark Theme")

##Facets

You can use facets to put multiple graphs in a single output, split by a variable. Facets can be laid out as rows, columns, or both (however, you cannot use the same variable for both rows and columns to make a matrix of combinations)! You can also use formula notation with ~, for example facet_grid(Tm ~ .).

ggplot(mvp[1:5, ], aes(x=G, y=TD, col = Tm)) + 
  geom_point(size=2) + facet_grid(rows = vars(Tm)) + theme(legend.position="none")

ggplot(mvp[1:5, ], aes(x=G, y=TD, col = Tm)) + 
  geom_point(size=2) + facet_grid(cols = vars(Tm)) + theme(legend.position="none")
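The formula notation with ~ does the same job as rows and cols. Here is a sketch of the three layouts, assuming the built-in mtcars data in place of mvp. The plot names are made up for the example.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

by_row <- p + facet_grid(cyl ~ .)    # rows = cyl, same as rows = vars(cyl)
by_col <- p + facet_grid(. ~ cyl)    # columns = cyl, same as cols = vars(cyl)
both   <- p + facet_grid(cyl ~ am)   # rows = cyl, columns = am

by_row
by_col
both
```

The variable on the left of ~ goes to the rows and the variable on the right goes to the columns. A dot means no variable on that side.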

##Legend Formatting

A couple extra tips to help make your graphs better. We adjust the legend position using theme().

base <- ggplot(mvp, aes(G, TD)) +
  geom_point(aes(colour = Tm))

base + theme(legend.position = "left")

base + theme(legend.position = "top")

base + theme(legend.position = "bottom")

base + theme(legend.position = "right") # the default

##Plotly

We can use the function ggplotly to turn our ggplots into interactive graphs! This works great for HTML knits, but in a Word or PDF document the plot is no longer interactive and acts as a static ggplot.

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
ggplotly(ggplot(qbs_start, aes(x=Att, y=TD, col=Tm, text=Player)) + geom_point(size=3))